2025.09.19 | Cross-platform GUI models top the leaderboards; FlowRL boosts reasoning via distribution matching
Description
The 15 papers in this episode are as follows:
[00:26] 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[01:01] 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning
[01:57] 🧭 Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
[02:55] 🧬 Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
[03:34] 🎨 Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
[04:12] 🔍 FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
[04:56] 🤖 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
[05:39] 🔮 AToken: A Unified Tokenizer for Vision
[06:10] 🌌 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
[06:58] 🖼 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
[07:54] 🎮 RecoWorld: Building Simulated Environments for Agentic Recommender Systems
[08:28] 🎯 Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
[09:03] 🔍 Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
[09:51] 🩺 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
[10:34] 🛰 FSG-Net: Frequency-Spatial Synergistic Gated Network for High-Resolution Remote Sensing Change Detection
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast content
Xiaohongshu (小红书): AI速递